Abstract

Reverse nearest neighbor queries are defined as follows: Given an input point-set $P$, and a query
point $q$, find all the points $p$ in $P$ whose nearest point in $P \cup \{q\} \setminus \{p\}$ is $q$. We give a data
structure to answer reverse nearest neighbor queries in fixed-dimensional Euclidean space. Our data
structure uses $O(n)$ space, its preprocessing time is $O(n \log n)$, and its query time is $O(\log n)$.

1 Introduction 

Given a set $P$ of $n$ points in $\mathbb{R}^d$, a well-known problem in computational geometry is nearest neighbor
searching: preprocess $P$ such that, for any query point $q$, a point in $P$ that is closest to $q$ can be reported
efficiently. This problem has been studied extensively; in this paper, we consider the related problem of
reverse nearest neighbor searching, which has attracted some attention recently.

The reverse nearest neighbor searching problem is the following. Given a query point $q$, we want
to report all the points in $P$ that have $q$ as one of their nearest neighbors. More formally, we want to
find the points $p \in P$ such that for all points $p' \in P \setminus \{p\}$, the distance $|pp'|$ is greater than or
equal to the distance $|pq|$.
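The definition above can be checked directly by brute force. The following Python sketch is only a quadratic-time reference for the definition, not the data structure of this paper; the function name `reverse_nearest_neighbors` is an assumption made for this illustration.

```python
import math

def reverse_nearest_neighbors(points, q):
    """Return the indices i such that q is a nearest neighbor of points[i]
    among the other input points and q itself (brute force, O(n^2) time)."""
    result = []
    for i, p in enumerate(points):
        d_q = math.dist(p, q)
        # p is reported when no other input point is strictly closer to p than q.
        if all(math.dist(p, other) >= d_q
               for j, other in enumerate(points) if j != i):
            result.append(i)
    return result

points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
print(reverse_nearest_neighbors(points, (0.4, 0.0)))  # prints [0, 1]
```

Such a routine is mainly useful as a correctness oracle when testing a faster implementation.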

The earliest work on reverse nearest neighbor searching is by Korn and Muthukrishnan [11]. They
motivate this problem by applications in databases, where for instance, one would like to know which
customers are affected by opening a new shop at a given location. Their approach is based on R-Trees, so
it is unlikely to give a good worst-case time bound. Subsequently, the reverse nearest neighbor searching
problem has attracted some attention in the database community [3, 12, 15, 16, 18, 19, 20].

The only previous work on reverse nearest neighbor searching where worst-case time bounds are
given is the work by Maheshwari et al. [13]. They give a data structure for the two-dimensional case,
using $O(n)$ space, with $O(n \log n)$ preprocessing time, and $O(\log n)$ query time. Their approach is the
following: they show that the arrangement of the largest empty circles centered at data points has linear
size, and then they answer queries by doing point location in this arrangement.

In this paper, we extend the result of Maheshwari et al. [13] to arbitrary fixed dimension. We give
a data structure for reverse nearest neighbor searching in $\mathbb{R}^d$, where $d = O(1)$, and using the Euclidean
distance. Our data structure has size $O(n)$, with preprocessing time $O(n \log n)$, and with query time
$O(\log n)$.

It is perhaps surprising that we can match the bounds for the two-dimensional case in arbitrary fixed
dimension. For nearest neighbor queries, this does not seem to be possible: The bounds for nearest
neighbor searching in higher dimension depend on the complexity of the Voronoi diagram, which is
$\Theta(n^{\lceil d/2 \rceil})$ in $d$-dimensional space. Intuitively, reverse nearest neighbor searching would seem to be a
more difficult problem; Lin et al. [12], for instance, consider reverse nearest neighbor searching as being
more computationally complicated than nearest neighbor searching.

Our approach is similar to some previous work on approximate Voronoi diagrams [2, 9, 10]: the
space is partitioned using a compressed quadtree, each cell of this quadtree containing a small set of
candidate points. Queries are answered by finding the cell containing the query point, and checking all
the candidate points in this cell. Interestingly, this approach allows us to answer reverse nearest neighbor
queries efficiently and exactly, while it only seems to give approximations for nearest neighbor searching.

Our model of computation is the real-RAM model with one additional operation: for a real number
$x > 0$, we can find in constant time the integer $i$ such that $2^i \le x < 2^{i+1}$. In practice, this operation
corresponds to finding the most significant bit in an integer or floating point number, and it can be performed
quickly using standard hardware. This operation is standard in compressed quadtree algorithms [8], as
it allows us to determine in constant time the smallest quadtree box containing two points. The work of
Maheshwari et al. [13] uses the real-RAM model, without this extra operation.
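For floating point inputs, the extra operation can be illustrated with the exponent extraction offered by Python's standard library. The sketch below is only an illustration of the operation; the name `floor_log2` is an assumption made here.

```python
import math

def floor_log2(x):
    """Return the integer i with 2**i <= x < 2**(i+1), for a float x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    # math.frexp returns (m, e) with x == m * 2**e and 0.5 <= m < 1,
    # hence 2**(e-1) <= x < 2**e.
    _, e = math.frexp(x)
    return e - 1

print(floor_log2(0.4))  # prints -2, since 0.25 <= 0.4 < 0.5
```

No logarithm is evaluated: the exponent is read from the floating point representation, which matches the "most significant bit" interpretation given above.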

2 Compressed quadtrees 

In this section, we describe compressed quadtrees, a well-known data structure in computational
geometry. A more detailed presentation can be found in Har-Peled's lecture notes [9], or in the article on skip
quadtrees by Eppstein, Goodrich, and Sun [8]. We first describe quadtrees, and then we describe their
compressed version.

We consider quadtrees in $\mathbb{R}^d$, where $d = O(1)$. We denote by $H_r$ the hypercube $[-1, 1]^d$; the leaves
of a quadtree will form a partition of $H_r$.

A quadtree box is either $H_r$, or is obtained by partitioning a quadtree box $H$ into $2^d$ equal-sized
hypercubes; these hypercubes are called the quadrants of $H$. A quadtree is a data structure that stores
quadtree boxes in a hierarchical manner. Each node $\nu$ of a quadtree stores a quadtree box $C(\nu)$, and
pointers to its parent and its children. We call $C(\nu)$ the cell of node $\nu$. In this paper, the cell of the root
of a quadtree is always the box $H_r$. Each node $\nu$ either is a leaf, or has $2^d$ children that store the $2^d$
quadrants of $C(\nu)$. With this definition, the cells of the leaves of a quadtree form a partition of $H_r$.

Let $S$ denote a set of $m$ quadtree boxes. We can construct the smallest quadtree whose nodes store
all boxes in $S$ as follows. We start by constructing the root. If $S \subseteq \{H_r\}$, then we are done. Otherwise,
we construct the $2^d$ children of the root. We consider the subset $S_1 \subseteq S$ (resp. $S_2, S_3, \dots$) of the boxes
in $S$ contained in the first quadrant (resp. second, third, ...). We construct recursively the quadtree, by
taking the first (resp. second, third, ...) child as the root and using the set of boxes $S_1$ (resp. $S_2, S_3, \dots$).

The above construction results in a quadtree that stores all the boxes in $S$. Even though it is the
smallest such quadtree, its size can be arbitrarily large when $S$ contains very small boxes. To remedy
this, we use a compressed quadtree, which allows us to bypass long chains of internal nodes.

In order to reduce the size of the data structure, we allow two different kinds of nodes in a compressed
quadtree. An ordinary node stores a quadtree box as before. A compressed node $\nu$, however, stores
the difference $H \setminus H'$ of two quadtree boxes $H$ and $H'$. We still call this difference the cell $C(\nu)$.
Compressed nodes are always leaves of the compressed quadtree.

As in a quadtree, the cells of the children of a node $\nu$ form a partition of $C(\nu)$. But two cases are
now possible: either these cells are the quadrants of $C(\nu)$, or $\nu$ has two children, one of them storing a
quadtree box $H \subset C(\nu)$, and the other storing $C(\nu) \setminus H$.

The construction of a compressed quadtree that stores all the boxes in $S$ is analogous to the
construction of the ordinary quadtree, with the following difference. Assume we are processing an internal node
$\nu$. Let $H$ denote the smallest quadtree box containing the boxes in $S$ that are strictly contained in $C(\nu)$.
If $H = C(\nu)$, then we proceed exactly as we did for the ordinary quadtree: we construct $2^d$ children
corresponding to the quadrants of $C(\nu)$. Otherwise, $\nu$ has two children, one stores $H$, and the other is a
compressed node that stores $C(\nu) \setminus H$. Intuitively, this construction of a compressed node allows us to
"zoom in" when all the boxes in $C(\nu)$ are within a small area, and avoids a long chain of internal nodes.
(See Figure 1.)
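As a rough illustration of the "zoom in" step, the following Python sketch finds the smallest quadtree box of $[-1, 1]^d$ that contains a set of points, by descending from the root while all points fall in the same quadrant. The function name `smallest_quadtree_box` and the `max_depth` cap are assumptions made for this example, and points stand in for the boxes of $S$. Under the model of computation of Section 1 this box is obtained in constant time for two points, whereas the loop below takes time proportional to the depth.

```python
def smallest_quadtree_box(points, max_depth=50):
    """Return (corner, side) of the smallest quadtree box of [-1, 1]^d,
    within max_depth levels, that contains all the given points.
    Assumes every point lies in [-1, 1]^d."""
    d = len(points[0])
    corner, side = (-1.0,) * d, 2.0
    for _ in range(max_depth):
        half = side / 2
        # Quadrant index (0 or 1 per coordinate) of each point in the current box;
        # min(..., 1) keeps points on the upper face inside the upper quadrant.
        quadrants = {
            tuple(min(int((p[k] - corner[k]) // half), 1) for k in range(d))
            for p in points
        }
        if len(quadrants) != 1:
            break  # the points split across quadrants: the current box is minimal
        (quad,) = quadrants
        corner = tuple(corner[k] + quad[k] * half for k in range(d))
        side = half
    return corner, side

print(smallest_quadtree_box([(0.1, 0.1), (0.2, 0.3)]))  # prints ((0.0, 0.0), 0.5)
```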

Figure 1: (a) A set of quadtree boxes. (b) The quadtree storing these quadtree boxes. (c) The compressed
quadtree storing the same set of quadtree boxes. The two cells on the left side correspond to compressed
nodes.



The following results on the size, construction time, and query time of compressed quadtrees are
well known; proofs can be found in Har-Peled's lecture notes [9] or in Eppstein et al.'s article [8].

Lemma 1 Let $S$ be a set of $m$ quadtree boxes contained in $H_r$. We can construct in time $O(m \log m)$ a
compressed quadtree $T$, such that each box in $S$ is the cell of a node of $T$. This compressed quadtree $T$
has size $O(m)$. After $O(m \log m)$ preprocessing time, we can find for any query point $q \in H_r$ the leaf
of $T$ whose cell contains $q$ in time $O(\log m)$.

Note that a query point might lie on the boundaries of several cells. In this case, we break the tie 
arbitrarily, and we return only one cell containing q. 

3 Data structure for reverse nearest neighbor queries 

In this section, we describe the construction of our data structure and how we answer reverse nearest 
neighbor queries. This data structure is a compressed quadtree, with a set of candidate points stored at 
each node. To answer a query, we locate the leaf of the compressed quadtree whose cell contains the 
query point, and we check all the candidate points in this leaf; the reverse nearest-neighbors are among 
these points. We start with some notation. 

Our input point set is denoted by $P = \{p_1, \dots, p_n\}$, with $n \ge 2$. We still work in $\mathbb{R}^d$, where
$d = O(1)$, and so $P \subset \mathbb{R}^d$. The empty ball $b_i$ is the largest ball centered at $p_i \in P$ that does not contain
any other point of $P$ in its interior. In other words, the boundary of the empty ball centered at $p$ goes
through the nearest point to $p$ in $P \setminus \{p\}$. In this paper, we only consider closed balls, so $p_i$ is a reverse
nearest neighbor of a query point $q$ if and only if $q \in b_i$.
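The characterization by empty balls can be made concrete as follows. This Python sketch computes the radii $r_i$ by brute force in quadratic time (the paper uses a faster all-nearest-neighbors algorithm instead) and then tests $q \in b_i$ for every $i$; the names `empty_ball_radii` and `rnn_from_balls` are assumptions made for this illustration.

```python
import math

def empty_ball_radii(points):
    """Brute-force radii r_i: distance from points[i] to its nearest other point."""
    return [
        min(math.dist(p, other) for j, other in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]

def rnn_from_balls(points, radii, q):
    """Indices i whose closed empty ball b_i (center points[i], radius radii[i]) contains q."""
    return [i for i, p in enumerate(points) if math.dist(p, q) <= radii[i]]

points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
radii = empty_ball_radii(points)                 # [1.0, 1.0, 4.0]
print(rnn_from_balls(points, radii, (0.4, 0.0)))  # prints [0, 1]
```

The point of the reformulation is that the radii depend only on $P$, so they can be computed once during preprocessing; a query then reduces to finding the balls that contain $q$.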

Let $H_P$ be a smallest axis-aligned $d$-dimensional hypercube containing the input point set $P$.
Without loss of generality, we assume that $H_P = [-1/(2\sqrt{d}), 1/(2\sqrt{d})]^d$; then any empty ball is contained in
$H_r = [-1, 1]^d$. When $\nu$ is an ordinary node, we denote by $s(\nu)$ the side length of the quadtree box $C(\nu)$.

We first compute the set of all the largest empty balls for $P$. This can be done in $O(n \log n)$ time
using Vaidya's all-nearest neighbors algorithm [17]. We denote by $r_i$ the radius of $b_i$. For each $b_i$, we
compute the quadtree boxes with side length in $[2r_i, 4r_i)$ that overlap $b_i$. Under our model of computation,
it can be done in $O(1)$ time. There are at most $2^d$ such boxes; we denote them by $h_i^j$, $j \in \{1, \dots, 2^d\}$.
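The box computation for a single empty ball can be sketched in Python as follows. The side length is the smallest power of two that is at least $2r$, obtained from the floating point exponent, which plays the role of the extra operation of the model of computation. The names `ball_meets_box` and `boxes_for_ball` are assumptions made here, boxes are represented by their lower corner, and "overlap" is interpreted as intersection of closed sets, which is also an assumption.

```python
import itertools
import math

def ball_meets_box(center, r, corner, side):
    """True if the closed ball meets the closed box [corner, corner + side]^d."""
    d2 = sum(max(lo - c, 0.0, c - (lo + side)) ** 2
             for c, lo in zip(center, corner))
    return d2 <= r * r

def boxes_for_ball(center, r):
    """Return (side, corners) of the quadtree boxes of [-1, 1]^d with side
    length in [2r, 4r) that meet the ball. Assumes 0 < r <= 1."""
    m, e = math.frexp(2 * r)  # 2r == m * 2**e with 0.5 <= m < 1
    # Smallest power of two that is >= 2r.
    side = math.ldexp(1.0, e - 1) if m == 0.5 else math.ldexp(1.0, e)
    cells = int(2 / side)     # number of grid cells per axis at this level
    ranges = []
    for c in center:
        lo = max(math.floor((c - r + 1) / side), 0)
        hi = min(math.floor((c + r + 1) / side), cells - 1)
        ranges.append(range(lo, hi + 1))  # at most two cells, since 2r <= side
    corners = [tuple(-1 + i * side for i in idx)
               for idx in itertools.product(*ranges)]
    return side, [b for b in corners if ball_meets_box(center, r, b, side)]

side, corners = boxes_for_ball((0.1, 0.1), 0.2)
print(side, len(corners))  # prints 0.5 4
```

Because the ball has diameter at most the side length, it spans at most two grid cells along each axis, which is where the bound of $2^d$ boxes comes from.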

Using Lemma 1, we construct in $O(n \log n)$ time a compressed quadtree $T$ of size $O(n)$ such that
each box $h_i^j$ appears in $T$. For each node $\nu$ of $T$, if the corresponding cell $C(\nu)$ is $h_i^j$, we store $p_i$ as a
candidate point for $\nu$. Storing these candidate points can be done during the construction of the quadtree
within the same time bound. Notice that we may store several candidate points for a given node $\nu$.

These sets of candidate points are not sufficient for our purpose, so we will add some other points.
For each ordinary (non-compressed) node $\nu$, we store the points $p_i$ such that $r_i > s(\nu)/4$ and $b_i$ overlaps
$C(\nu)$; this list of candidate points is denoted by $\mathcal{L}(\nu)$. In order to analyze our algorithm, we need the
following lemma, which is proved in Section 4.

Lemma 2 For any ordinary node $\nu$, the cardinality of the set of candidate points $\mathcal{L}(\nu)$ stored at $\nu$
is $O(1)$.

We construct the lists $\mathcal{L}(\cdot)$ by traversing $T$ recursively, starting from the root. Assume that $\nu$ is the
current ordinary node. The points $p_i$ such that $h_i^j = C(\nu)$ for some $j$ have already been stored at $\nu$. By
our construction, they are the points $p_i$ in $\mathcal{L}(\nu)$ such that $s(\nu)/4 < r_i \le s(\nu)/2$. So we need the other
candidate points $p_k$, such that $r_k > s(\nu)/2$. These points can be found in $\mathcal{L}(\nu')$, where $\nu'$ is the parent
of $\nu$. So we insert in $\mathcal{L}(\nu)$ all the points $p_k \in \mathcal{L}(\nu')$ such that $b_k$ overlaps $C(\nu)$, which completes the
construction of $\mathcal{L}(\nu)$. By Lemma 2, this can be done in $O(1)$ time per node, and thus overall, computing
the lists of candidate points for ordinary nodes takes $O(n)$ time.
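The top-down propagation of candidate points can be sketched as below. The dictionary-based node layout (`corner`, `side`, `own`, `children`, `cand`) and the function names are hypothetical and chosen only for this illustration. Only ordinary nodes are modeled (a compressed node would simply copy its parent's list), and "overlaps" is again taken to mean intersection of closed sets.

```python
def ball_overlaps_cell(center, r, corner, side):
    """True if the closed ball meets the closed box [corner, corner + side]^d."""
    d2 = sum(max(lo - c, 0.0, c - (lo + side)) ** 2
             for c, lo in zip(center, corner))
    return d2 <= r * r

def build_candidate_lists(node, parent_list=()):
    """Fill node["cand"] for every ordinary node, starting from the root.

    node["own"] holds the pairs (p_i, r_i) stored at this node because one of
    the boxes of p_i equals the cell of the node.
    """
    # Inherit from the parent only the balls that still reach this cell.
    inherited = [(p, r) for (p, r) in parent_list
                 if ball_overlaps_cell(p, r, node["corner"], node["side"])]
    node["cand"] = list(node["own"]) + inherited
    for child in node["children"]:
        build_candidate_lists(child, node["cand"])
```

Each node does work proportional to the length of its parent's list, so a constant bound on the list sizes gives linear total time for the traversal.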

If $\nu$ is a compressed node, and $\nu'$ is its parent, we just set $\mathcal{L}(\nu) = \mathcal{L}(\nu')$. We complete the
construction of our data structure by handling all the compressed nodes.

Given a query point $q$, we answer reverse nearest-neighbor queries as follows. If $q \notin H_r$, then
we return $\emptyset$, because we saw earlier that all empty balls are in $H_r$. Otherwise, we find the leaf $\nu$ such
that $q \in C(\nu)$, which can be done in $O(\log n)$ time by Lemma 1. For each point $p_i \in \mathcal{L}(\nu)$, we check
whether $p_i$ is a reverse nearest neighbor, that is, we check whether $q \in b_i$. If this is the case, we report $p_i$.

We still need to argue that we answered the query correctly. Assume that $p_k$ is a reverse nearest
neighbor of $q$, and the leaf $\nu$ containing $q$ is an ordinary node. As $q \in b_k$, we have $q \in h_k^j$ for some $j$,
and since the side length of $h_k^j$ is less than $4r_k$, we have $s(\nu) < 4r_k$. Since $b_k$ contains $q$, it overlaps
$C(\nu)$, so by definition of $\mathcal{L}(\nu)$, we have $p_k \in \mathcal{L}(\nu)$, and thus $p_k$ was reported. If $\nu$ is a compressed node,
then the same proof works if we replace $\nu$ by its parent $\nu'$, since $\mathcal{L}(\nu) = \mathcal{L}(\nu')$.

The discussion above proves the main result of this paper: 

Theorem 3 Let $P$ be a set of $n$ points in $\mathbb{R}^d$. We assume that $d = O(1)$. Then we can construct in
time $O(n \log n)$ a data structure of size $O(n)$ that answers reverse nearest-neighbor queries in $O(\log n)$
time. The number of reverse nearest neighbors is $O(1)$.

The fact that the number of reverse nearest neighbors is $O(1)$ was known: it follows from the fact that
the maximum degree of nearest neighbor graphs in fixed dimension is constant. See for instance the
article by Miller et al. [14]. It can also be shown directly by a simple packing argument.

4 Proof of Lemma 2

In this section, we prove Lemma 2, which was needed to establish the time bounds in Theorem 3. We
start with a packing lemma.

Lemma 4 Let $b$ be a ball with radius $r$. Then $b$ intersects at most $2 \times 5^d$ empty balls with radius larger
than or equal to $r$.

Proof: When $x, y \in \mathbb{R}^d$, we denote by $|xy|$ the Euclidean distance between $x$ and $y$, and we denote by
$\overline{xy}$ the line segment connecting them.

We denote by $c$ the center of $b$, and we denote by $b'$ the ball with center $c$ and radius $2r$. We first
bound the number of empty balls with radius $\ge r$ whose center is contained in $b'$. Let $B$ denote this set
of balls, and let $C$ denote the set of their centers. Any two points in $C$ are at distance at least $r$ from
each other. Hence, the balls with radius $r/2$ and with centers in $C$ are disjoint. As they are all contained
in the ball $b''$ with center $c$ and radius $5r/2$, the sum of their volumes is at most the volume of $b''$. Hence,
we have $|C| \le 5^d$, and thus $|B| \le 5^d$.
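The volume comparison behind this bound can be written out explicitly. Below, $\kappa_d$ denotes the volume of the $d$-dimensional unit ball; this symbol is introduced here only for the computation. Every center in $C$ is within distance $2r$ of $c$, so each ball of radius $r/2$ around such a center lies within distance $5r/2$ of $c$.

```latex
\[
  |C| \cdot \kappa_d \left(\frac{r}{2}\right)^{d}
  \;\le\; \kappa_d \left(\frac{5r}{2}\right)^{d}
  \quad\Longrightarrow\quad
  |C| \;\le\; \left(\frac{5r/2}{r/2}\right)^{d} = 5^{d} .
\]
```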

We now consider the empty balls with radius $\ge r$ that intersect $b$, and whose centers are not in $b'$.
We denote by $B'$ the set of these balls, and we denote by $C'$ the set of their centers. Let $b_1$ (resp. $b_2$) be
a ball in $B'$ with radius $r_1$ (resp. $r_2$) and center $c_1$ (resp. $c_2$). (See Figure 2.) Without loss of generality,
we assume that $r_1 \le r_2$.

Figure 2: Proof of Lemma 4.

Let $c_2'$ be the point of $\overline{cc_2}$ such that $|cc_2'| = |cc_1|$. Let $r_2' = r_2 - |c_2c_2'|$, and let $b_2'$ denote the ball
with center $c_2'$ and radius $r_2'$. As $b_2' \subset b_2$, we know that $b_2'$ does not contain $c_1$ in its interior, and thus
$r_2' \le |c_1c_2'|$. As $b_2'$ intersects $b$, we have $|cc_2'| \le r + r_2'$. It implies that $|cc_2'| - r \le |c_1c_2'|$. Since
$|cc_2'| \ge 2r$, it follows that $|cc_2'| \le 2|c_1c_2'|$.
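For readers who want the last step spelled out, the facts used in this paragraph chain as follows, where $c_2'$ is the point of the segment from $c$ to $c_2$ at distance $|cc_1|$ from $c$, and $r_2'$ is the radius of the shrunken ball $b_2'$ centered at $c_2'$. This is only a restatement of the argument, with no additional assumption.

```latex
\[
  |cc_2'| - r \;\le\; r_2' \;\le\; |c_1c_2'|,
  \qquad
  |cc_2'| = |cc_1| \ge 2r
  \;\Longrightarrow\;
  \tfrac{1}{2}\,|cc_2'| \;\le\; |cc_2'| - r \;\le\; |c_1c_2'| .
\]
```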

Let $c_1''$ (resp. $c_2''$) denote the projection of $c_1$ (resp. $c_2$) onto the unit sphere $u$ centered at $c$. In other
words, $c_1'' = c + (1/|cc_1|)(c_1 - c)$. Then it follows from the previous paragraph that $|c_1''c_2''| \ge 1/2$. Hence,
the spheres with radius $1/4$ and centered at the projections onto $u$ of the points in $C'$ are disjoint. As
these spheres are contained in the sphere with radius $5/4$ centered at $c$, we have $|C'| \le 5^d$, and thus
$|B'| \le 5^d$. □

Now we prove Lemma 2: for any ordinary node $\nu$, the number of candidate points stored in $\mathcal{L}(\nu)$
is $O(1)$. We assume that $p_i \in \mathcal{L}(\nu)$. By definition, we must have $r_i > s(\nu)/4$, and $b_i$ overlaps $C(\nu)$.
As $C(\nu)$ can be covered by $O(1)$ balls with radius $s(\nu)/4$, Lemma 4 implies that there can be only $O(1)$
such candidate points.

5 Concluding remarks 

Our approach does not only give a data structure to answer reverse nearest neighbor queries, it also yields 
a reverse Voronoi diagram: a spatial subdivision with linear complexity such that, within each cell, the 
set of reverse nearest neighbors is fixed. To achieve this, we construct, within the cell of each leaf of our
quadtree, the arrangement of the empty balls of the candidate points. As there is only a constant number
of candidates per cell, each such arrangement has constant complexity, so overall we get a subdivision 
of linear size. 

The query time of our data structure can be improved in the word RAM model of computation, when
the coordinates of the input points are $O(\log n)$-bit integers. In this case, the shuffle-and-sort approach
of Chan H yields a query time $O(\log \log n)$. Under this model, the construction time of the compressed
quadtree can also be improved Q to $O(m)$, but it does not affect the construction time of our data
structure, which is dominated by the all-nearest neighbors computation.

The most natural extension to this problem would be to handle different metrics. Our approach
applies directly to any norm of $\mathbb{R}^d$, with $d = O(1)$, as its unit ball can be made fat after changing the
coordinate system. The time bounds and space usage remain the same.

Another possible extension would be to make our algorithm dynamic. The main difficulty is that it
seems that we would need to maintain the empty balls, which means maintaining all nearest neighbors.
The result of Maheshwari et al. [13], combined with the data structure of Chan for dynamic nearest
neighbors [6], gives polylogarithmic update time and query time in $\mathbb{R}^2$. In higher dimension, these
bounds would be considerably worse, if one uses the best known data structures for dynamic nearest
neighbors fflia.

Finally, it would be interesting to find the dependency of our time bounds on the dimension $d$. We did
not deal with this issue, because one would first have to find this dependency for constructing compressed
quadtrees, which is not the focus of this paper.